1 United States Department of Agriculture—Agricultural Research Service—Plant Science Research, St. Paul, MN 55108, USA
2 Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN 55108, USA
3 Department of Geography and Geology, University of Nebraska at Omaha, Omaha, NE 68182, USA
4 Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164, USA
5 Department of Biological Systems Engineering, University of Wisconsin, Madison, WI 53706, USA
6 Department of Agricultural Sciences, Fort Valley State University, Fort Valley, GA 31030, USA
7 Biosciences Division and Center for Bioenergy Innovation, Oak Ridge National Laboratory, Oak Ridge, TN 37830, USA
8 These authors contributed equally to this work
Received 28 Mar 2024 | Accepted 20 Aug 2024 | Published 11 Sep 2024
Background: Root system architecture (RSA) is of growing interest for plant improvement targeting belowground root traits. Modern computing technology applied to images offers new pathways to trait improvement and selection through RSA analysis (using images to discern and classify root types and traits). A major stumbling block to image-based RSA phenotyping, however, is image label noise, which reduces the accuracy of models that take images as direct inputs. To address the label noise problem, this study coupled an artificial intelligence model that classifies the RSA of alfalfa (Medicago sativa L.) directly from images with downstream label-improvement methods. Model classifications of the images were compared with manual root classifications, and confident learning (CL) and reactive learning (RL) methods were tested to minimize the effects of subjective labeling and to improve labeling and prediction accuracies.

Results: The CL algorithm modestly improved the Random Forest model's overall prediction accuracy on the Minnesota dataset (~1%), while larger gains in accuracy were observed with the ResNet-18 model. ResNet-18 cross-population prediction accuracy improved by ~8% to 13% with CL compared to the original/preprocessed datasets. The training and testing data combinations with the highest accuracies (86%) for predicting taproot RSAs resulted from the CL- and/or RL-corrected datasets. Similarly, the highest accuracies achieved for the intermediate RSA class resulted from corrected data combinations. The highest overall accuracy (~75%) using the ResNet-18 model involved CL on a pooled dataset containing images from both sample locations.

Conclusions: ResNet-18 DNN prediction accuracies for alfalfa RSA image labels increase when CL and RL are employed. By enlarging the dataset to reduce overfitting while concurrently finding and correcting image label errors, accuracy increases of as much as ~11% to 13% can be achieved with semi-automated, computer-assisted preprocessing and data cleaning (CL/RL).
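For readers unfamiliar with confident learning, the following Python sketch illustrates, under simplifying assumptions, how likely label errors can be flagged from a model's out-of-sample predicted class probabilities. The function name, variable names, and thresholding rule shown here are illustrative simplifications, not the study's actual implementation; in practice a library such as cleanlab provides the full CL method.

    # Illustrative, simplified confident-learning-style label-error detection.
    # Assumes `pred_probs` holds out-of-sample predicted class probabilities
    # (n_samples x n_classes) from a classifier such as ResNet-18, and `labels`
    # holds the noisy given labels as integers 0..n_classes-1.
    import numpy as np

    def find_label_issues(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
        """Return a boolean mask marking examples whose given label looks erroneous."""
        n_classes = pred_probs.shape[1]
        # Per-class "self-confidence" threshold: mean predicted probability of
        # class k over the examples currently labeled k.
        thresholds = np.array([
            pred_probs[labels == k, k].mean() if np.any(labels == k) else 1.0
            for k in range(n_classes)
        ])
        # Margin of each class probability over its own threshold.
        margins = pred_probs - thresholds              # shape (n_samples, n_classes)
        confident_class = margins.argmax(axis=1)       # most confidently indicated class
        has_confident = margins.max(axis=1) >= 0       # at least one class exceeds its threshold
        # Flag examples whose confidently indicated class disagrees with the given label.
        return has_confident & (confident_class != labels)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        labels = rng.integers(0, 3, size=10)             # e.g., taproot / intermediate / branched
        pred_probs = rng.dirichlet(np.ones(3), size=10)  # stand-in for model outputs
        print(find_label_issues(labels, pred_probs))

Examples flagged by such a procedure would then be re-examined (or relabeled) before retraining, which is the general role CL/RL play in the label-cleaning workflow summarized above.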